1,539 research outputs found

    Sequential Implementation of Monte Carlo Tests with Uniformly Bounded Resampling Risk

    Full text link
    This paper introduces an open-ended sequential algorithm for computing the p-value of a test using Monte Carlo simulation. It guarantees that the resampling risk, the probability of reaching a different decision than the one based on the theoretical p-value, is uniformly bounded by an arbitrarily small constant. Previously suggested sequential or non-sequential algorithms that use a bounded sample size do not have this property. Although the algorithm is open-ended, the expected number of steps is finite, except when the p-value lies exactly on the threshold between rejecting and not rejecting. The algorithm is suitable as a standard for implementing tests that require (re-)sampling. It can also be used in other situations: to check whether a test is conservative, to implement double bootstrap tests iteratively, and to determine the sample size required for a certain power. Comment: Major revision; 15 pages, 4 figures.
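    The core of such a procedure can be sketched as a stopping rule: keep simulating until a confidence bound for the Monte Carlo p-value that is valid at every sample size excludes the rejection threshold. The sketch below uses a simple Hoeffding-plus-union-bound confidence sequence rather than the paper's boundary construction, and the callable `sample_statistic` (one draw of the test statistic under the null) is a hypothetical interface; it only illustrates how an open-ended sequential test with a bounded decision risk can be organized.

```python
import numpy as np

def sequential_mc_test(sample_statistic, observed_stat, alpha=0.05,
                       risk_bound=1e-3, max_steps=10**6, rng=None):
    """Open-ended sequential Monte Carlo test (illustrative sketch).

    Stops once an anytime-valid confidence interval for the true Monte Carlo
    p-value lies entirely above or below `alpha`, so the probability of a
    decision different from the one based on the exact p-value is at most
    `risk_bound` (ignoring the truncation at `max_steps`).
    """
    rng = np.random.default_rng(rng)
    exceed = 0
    for n in range(1, max_steps + 1):
        exceed += int(sample_statistic(rng) >= observed_stat)
        p_hat = exceed / n
        # Hoeffding half-width with a union bound over all n, so the interval
        # [p_hat - eps, p_hat + eps] covers the true p-value uniformly in n
        # with probability at least 1 - risk_bound.
        eps = np.sqrt(np.log(np.pi**2 * n**2 / (3 * risk_bound)) / (2 * n))
        if p_hat - eps > alpha:
            return "do not reject", p_hat, n
        if p_hat + eps < alpha:
            return "reject", p_hat, n
    return "undecided", p_hat, max_steps

# Toy usage: Monte Carlo test of H0: mean = 0 for a Gaussian sample.
data = np.random.default_rng(0).normal(0.3, 1.0, size=50)
decision, p_hat, steps = sequential_mc_test(
    lambda rng: abs(rng.normal(0.0, 1.0, size=50).mean()), abs(data.mean()))
```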

    Composite Correlation Quantization for Efficient Multimodal Retrieval

    Full text link
    Efficient similarity retrieval from large-scale multimodal databases is pervasive in modern search engines and social networks. To support queries across content modalities, the system should enable cross-modal correlation and computation-efficient indexing. While hashing methods have shown great potential in achieving this goal, current attempts generally fail to learn isomorphic hash codes in a seamless scheme; that is, they embed multiple modalities in a continuous isomorphic space and separately threshold the embeddings into binary codes, which incurs a substantial loss of retrieval accuracy. In this paper, we approach seamless multimodal hashing by proposing a novel Composite Correlation Quantization (CCQ) model. Specifically, CCQ jointly finds correlation-maximal mappings that transform different modalities into an isomorphic latent space, and learns composite quantizers that convert the isomorphic latent features into compact binary codes. An optimization framework is devised to preserve both intra-modal similarity and inter-modal correlation by minimizing both reconstruction and quantization errors, and it can be trained from both paired and partially paired data in linear time. A comprehensive set of experiments clearly shows the superior effectiveness and efficiency of CCQ against state-of-the-art hashing methods for both unimodal and cross-modal retrieval.
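    A rough stand-in for the two ingredients, correlation-maximal mappings and composite quantization, can be assembled from off-the-shelf pieces: CCA for the shared latent space and per-subspace k-means (product-quantization style) for the compact codes. This is not CCQ's joint optimization, and all names below (`X_img`, `X_txt`, `latent_dim`, `n_subspaces`) are illustrative; the sketch only shows the shape of the pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.cross_decomposition import CCA

def ccq_like_encode(X_img, X_txt, latent_dim=16, n_subspaces=4, k=16, seed=0):
    """Map two modalities into a shared latent space with CCA, then quantize
    each latent subspace with k-means to obtain short discrete codes.
    CCQ learns both stages jointly; here they are trained separately."""
    cca = CCA(n_components=latent_dim).fit(X_img, X_txt)
    Z_img, Z_txt = cca.transform(X_img, X_txt)
    Z = np.vstack([Z_img, Z_txt])                       # isomorphic latent space
    subspaces = np.array_split(np.arange(latent_dim), n_subspaces)
    codebooks, codes = [], []
    for idx in subspaces:
        km = KMeans(n_clusters=k, n_init=4, random_state=seed).fit(Z[:, idx])
        codebooks.append(km)
        codes.append(km.predict(Z[:, idx]))
    codes = np.stack(codes, axis=1).astype(np.uint8)    # one byte per subspace
    return cca, codebooks, codes
```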

    Energy-based temporal neural networks for imputing missing values

    Get PDF
    Imputing missing values in high-dimensional time series is a difficult problem. There have been some approaches to the problem [11,8] in which neural architectures were trained as probabilistic models of the data. However, we argue that this approach is not optimal. We propose to view temporal neural networks with latent variables as energy-based models and train them for missing value recovery directly. In this paper we introduce two energy-based models: the first is based on a one-dimensional convolution and the second utilizes a recurrent neural network. We demonstrate how ideas from the energy-based learning framework can be used to train these models to recover missing values. The models are evaluated on a motion capture dataset.
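    For the convolutional variant, the energy-based recipe can be sketched as follows: define the energy of a sequence as its reconstruction error under a small 1D-convolutional network, and impute by descending that energy with respect to the missing entries while the observed entries stay fixed. The architecture, the `impute` routine, and the assumption of an already-trained model are all illustrative choices, not the paper's exact models or training objective.

```python
import torch
import torch.nn as nn

class ConvEnergy(nn.Module):
    """Tiny 1D-convolutional energy model: energy = reconstruction error."""
    def __init__(self, n_features, hidden=32, kernel=5):
        super().__init__()
        pad = kernel // 2
        self.net = nn.Sequential(
            nn.Conv1d(n_features, hidden, kernel, padding=pad), nn.ReLU(),
            nn.Conv1d(hidden, n_features, kernel, padding=pad))

    def energy(self, x):                      # x: (batch, features, time)
        return ((self.net(x) - x) ** 2).mean(dim=(1, 2))

def impute(model, x, mask, steps=200, lr=0.1):
    """Fill entries where mask == 0 by gradient descent on the energy,
    keeping observed entries (mask == 1) fixed.  `mask` is a float tensor
    of zeros and ones with the same shape as `x`."""
    x = x.clone()
    x[mask == 0] = 0.0
    free = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([free], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        filled = x * mask + free * (1.0 - mask)
        model.energy(filled).sum().backward()
        opt.step()
    return (x * mask + free * (1.0 - mask)).detach()
```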

    Bayesian Parameter Estimation for Latent Markov Random Fields and Social Networks

    Get PDF
    Undirected graphical models are widely used in statistics, physics and machine vision. However, Bayesian parameter estimation for undirected models is extremely challenging, since evaluation of the posterior typically involves the calculation of an intractable normalising constant. This problem has received much attention, but very little of it has focussed on the important practical case where the data consist of noisy or incomplete observations of the underlying hidden structure. This paper specifically addresses this problem, comparing two alternative methodologies. In the first of these approaches, particle Markov chain Monte Carlo (Andrieu et al., 2010) is used to efficiently explore the parameter space, combined with the exchange algorithm (Murray et al., 2006) to avoid the calculation of the intractable normalising constant (a proof showing that this combination targets the correct distribution is found in a supplementary appendix online). This approach is compared with approximate Bayesian computation (Pritchard et al., 1999). Applications to estimating the parameters of Ising models and exponential random graphs from noisy data are presented. Each algorithm used in the paper targets an approximation to the true posterior, because MCMC is used to simulate from the latent graphical model in lieu of being able to do this exactly in general. The supplementary appendix also describes the nature of the resulting approximation. Comment: 26 pages, 2 figures, accepted in Journal of Computational and Graphical Statistics (http://www.amstat.org/publications/jcgs.cfm).
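    The exchange move at the heart of the first approach is easy to state for a fully observed exponential-family model such as the Ising model: simulate an auxiliary data set at the proposed parameter so that the intractable normalising constants cancel from the acceptance ratio. The sketch below does exactly that, with Gibbs sweeps standing in for exact simulation (the approximation the abstract mentions) and an improper flat prior; it omits the particle MCMC machinery for latent or noisy observations.

```python
import numpy as np

def ising_stat(x):
    """Sufficient statistic: sum of spin products over nearest-neighbour pairs."""
    return (x[:, :-1] * x[:, 1:]).sum() + (x[:-1, :] * x[1:, :]).sum()

def gibbs_sweeps(theta, shape, sweeps, rng):
    """Approximate draw from the Ising model at `theta` via Gibbs sampling,
    standing in for the exact simulation the exchange algorithm requires."""
    x = rng.choice([-1, 1], size=shape)
    rows, cols = shape
    for _ in range(sweeps):
        for i in range(rows):
            for j in range(cols):
                nb = sum(x[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= a < rows and 0 <= b < cols)
                p = 1.0 / (1.0 + np.exp(-2.0 * theta * nb))
                x[i, j] = 1 if rng.random() < p else -1
    return x

def exchange_mcmc(y, n_iter=500, prop_sd=0.05, sweeps=20, rng=None):
    """Exchange algorithm (Murray et al., 2006) for the Ising coupling theta,
    here with a flat prior and a Gaussian random-walk proposal."""
    rng = np.random.default_rng(rng)
    theta, s_y, trace = 0.0, ising_stat(y), []
    for _ in range(n_iter):
        theta_prop = theta + prop_sd * rng.standard_normal()
        x = gibbs_sweeps(theta_prop, y.shape, sweeps, rng)   # auxiliary data
        # Normalising constants cancel in this log acceptance ratio.
        log_ratio = (theta_prop - theta) * s_y + (theta - theta_prop) * ising_stat(x)
        if np.log(rng.random()) < log_ratio:
            theta = theta_prop
        trace.append(theta)
    return np.array(trace)
```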

    A Bayesian reassessment of nearest-neighbour classification

    Get PDF
    The k-nearest-neighbour procedure is a well-known deterministic method used in supervised classification. This paper proposes a reassessment of this approach as a statistical technique derived from a proper probabilistic model; in particular, we modify the assessment made in a previous analysis of this method undertaken by Holmes and Adams (2002, 2003), and evaluated by Manocha and Girolami (2007), where the underlying probabilistic model is not completely well defined. Once a clear probabilistic basis for the k-nearest-neighbour procedure is established, we derive computational tools for conducting Bayesian inference on the parameters of the corresponding model. In particular, we assess the difficulties inherent in the pseudo-likelihood and path sampling approximations of an intractable normalising constant, and propose a perfect sampling strategy to implement a correct MCMC sampler associated with our model. If perfect sampling is not available, we suggest using a Gibbs sampling approximation. Illustrations of the performance of the corresponding Bayesian classifier are provided for several benchmark datasets, demonstrating in particular the limitations of the pseudo-likelihood approximation in this set-up.
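    The tractable surrogate at issue, the pseudo-likelihood, can be written down directly from the full conditionals of a Boltzmann-type k-nearest-neighbour model: each label is scored against the labels of its k nearest neighbours. The sketch below pairs that surrogate with a random-walk Metropolis step for the interaction parameter beta; it is a simplified variant for illustration, not the exact symmetrized model or the perfect-sampling strategy of the paper.

```python
import numpy as np

def knn_pseudo_loglik(beta, k, X, y, n_classes):
    """Pseudo-log-likelihood for a Boltzmann-type k-NN model in which
    p(y_i | y_-i) is proportional to exp(beta / k * #{neighbours of i with
    label y_i}).  The exact likelihood has an intractable normalising constant."""
    ll = 0.0
    for i in range(len(y)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                                  # exclude the point itself
        neighbours = np.argsort(d)[:k]
        votes = np.bincount(y[neighbours], minlength=n_classes)
        logits = beta / k * votes
        ll += logits[y[i]] - np.logaddexp.reduce(logits)
    return ll

def metropolis_beta(X, y, k, n_classes, n_iter=2000, prop_sd=0.3, rng=None):
    """Random-walk Metropolis for beta > 0 under a flat prior, using the
    pseudo-likelihood above (illustrative only)."""
    rng = np.random.default_rng(rng)
    beta, ll, trace = 1.0, knn_pseudo_loglik(1.0, k, X, y, n_classes), []
    for _ in range(n_iter):
        beta_prop = beta + prop_sd * rng.standard_normal()
        if beta_prop > 0:
            ll_prop = knn_pseudo_loglik(beta_prop, k, X, y, n_classes)
            if np.log(rng.random()) < ll_prop - ll:
                beta, ll = beta_prop, ll_prop
        trace.append(beta)
    return np.array(trace)
```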

    The statistical mechanics of networks

    Full text link
    We study the family of network models derived by requiring the expected properties of a graph ensemble to match a given set of measurements of a real-world network, while maximizing the entropy of the ensemble. Models of this type play the same role in the study of networks as is played by the Boltzmann distribution in classical statistical mechanics; they offer the best prediction of network properties subject to the constraints imposed by a given set of observations. We give exact solutions of models within this class that incorporate arbitrary degree distributions and arbitrary but independent edge probabilities. We also discuss some more complex examples with correlated edges that can be solved approximately or exactly by adapting various familiar methods, including mean-field theory, perturbation theory, and saddle-point expansions. Comment: 15 pages, 4 figures.
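    The simplest member of this family, the ensemble with independent edges constrained to match the expected degrees, has edge probabilities of the logistic form p_ij = 1 / (1 + exp(-(theta_i + theta_j))). A small numerical fit of the multipliers theta (gradient ascent on the degree constraints) is sketched below; the paper treats this case analytically, so the code is only a way to reproduce the ensemble in practice.

```python
import numpy as np

def fit_degree_ensemble(degrees, n_iter=2000, lr=0.5):
    """Fit the maximum-entropy ensemble with independent edges and expected
    degrees equal to `degrees`.  Edge probabilities take the logistic form
    p_ij = 1 / (1 + exp(-(theta_i + theta_j))); theta is adjusted until the
    expected degrees match the observed ones (a numerical illustration only)."""
    degrees = np.asarray(degrees, dtype=float)
    n = len(degrees)
    theta = np.zeros(n)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] + theta[None, :])))
        np.fill_diagonal(p, 0.0)                     # no self-loops
        theta += lr * (degrees - p.sum(axis=1)) / n  # match expected degrees
    return theta, p
```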

    Solution of the 2-star model of a network

    Full text link
    The p-star model or exponential random graph is among the oldest and best-known of network models. Here we give an analytic solution for the particular case of the 2-star model, which is one of the most fundamental exponential random graphs. We derive expressions for a number of quantities of interest in the model and show that the degenerate region of the parameter space observed in computer simulations is a spontaneously symmetry-broken phase, separated from the normal phase of the model by a conventional continuous phase transition. Comment: 5 pages, 3 figures.
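    The model itself is simple to simulate, which is how the degenerate (symmetry-broken) region was originally observed: an edge-toggling Metropolis sampler for P(G) proportional to exp(theta1 * edges + theta2 * two-stars). The sketch below is such a sampler under that sign convention; the paper's contribution is the analytic solution, which the code does not reproduce.

```python
import numpy as np

def sample_two_star(n, theta1, theta2, sweeps=200, rng=None):
    """Metropolis sampler for the 2-star exponential random graph,
    P(G) proportional to exp(theta1 * #edges + theta2 * #2-stars).
    Toggling edge (i, j) changes the 2-star count by deg_i + deg_j when the
    edge is added and by -(deg_i - 1 + deg_j - 1) when it is removed."""
    rng = np.random.default_rng(rng)
    A = np.zeros((n, n), dtype=int)
    deg = np.zeros(n, dtype=int)
    for _ in range(sweeps * n * (n - 1) // 2):
        i, j = rng.integers(n), rng.integers(n)
        if i == j:
            continue
        if A[i, j] == 0:   # propose adding the edge
            dH = theta1 + theta2 * (deg[i] + deg[j])
        else:              # propose removing the edge
            dH = -theta1 - theta2 * (deg[i] - 1 + deg[j] - 1)
        if np.log(rng.random()) < dH:          # Metropolis acceptance
            delta = 1 - 2 * A[i, j]
            A[i, j] = A[j, i] = A[i, j] ^ 1
            deg[i] += delta
            deg[j] += delta
    return A
```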

    Creation and characterization of vortex clusters in atomic Bose-Einstein condensates

    Full text link
    We show that a moving obstacle, in the form of an elongated paddle, can create vortices that are dispersed, or can induce clusters of like-signed vortices, in 2D Bose-Einstein condensates. We propose new statistical measures of clustering based on Ripley's K-function, which are suited to the small size and small number of vortices in atomic condensates, which lack the huge number of length scales excited in larger classical and quantum turbulent fluid systems. The evolution and decay of clustering are analyzed using these measures. Experimentally, it should prove possible to create such an obstacle with a laser beam and a moving optical mask. The theoretical techniques we present are accessible to experimentalists and extend the current methods available for inducing 2D quantum turbulence in Bose-Einstein condensates. Comment: 9 pages, 9 figures.
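    The textbook version of the statistic being adapted is straightforward: Ripley's K counts, for each radius r, how many other points fall within r of a typical point, normalized so that a Poisson pattern gives K(r) = pi * r^2. The sketch below is that plain, edge-correction-free estimator applied to vortex positions; the paper's measures are refinements of this idea for small vortex numbers, which the code does not attempt to reproduce.

```python
import numpy as np

def ripley_k(points, radii, area):
    """Plain Ripley K-function for a 2D point pattern (no edge correction):
    K(r) = area / (n * (n - 1)) * #{ordered pairs (i, j), i != j, dist <= r}."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.array([area * (d <= r).sum() / (n * (n - 1)) for r in radii])

# To probe clustering of like-signed vortices, evaluate K separately on the
# positive- and negative-circulation vortices and compare with the Poisson
# expectation K_poisson(r) = np.pi * r**2.
```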

    Phase I–II trial design for biologic agents using conditional auto‐regressive models for toxicity and efficacy

    Full text link
    Peer Reviewed
    https://deepblue.lib.umich.edu/bitstream/2027.42/147824/1/rssc12314_am.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/147824/2/rssc12314.pd